{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9cf9f373",
   "metadata": {},
   "source": [
    "# File Exploration\n",
    "\n",
    "In addition to core document processing capabilities, the `unstructured` library includes utilities for summarizing information about raw documents. We will cover how to use these utilities in this notebook. At the conclusion of this notebook, you should understand:\n",
    "\n",
    "- [Filetype detection in `unstructured`](#filetype)\n",
    "- [How to generate summary statistics about documents](#summary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "59392a21",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pathlib\n",
    "\n",
    "DIRECTORY = os.path.abspath(\"\")\n",
    "EXAMPLE_DOCS_DIRECTORY = os.path.join(DIRECTORY, \"..\", \"..\", \"example-docs\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a6ce38f",
   "metadata": {},
   "source": [
    "## Filetype Detection <a id=\"filetype\"></a>\n",
    "\n",
    "The `unstructured` library includes a `detect_filetype` function that helps detect the type of an input file. To use the filetype detection utilities, you will need to install the `libmagic` library because `unstructured` uses this library under the hood for filetype detection. In addition to the MIME type from `libmagic`, the `unstructured` library uses the file extension and in some cases inspect the document to determine the document type. The following is an example of how to call `detect_filetype`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c6bd2f4a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<FileType.HTML: 50>"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from unstructured.file_utils.filetype import detect_filetype\n",
    "\n",
    "filename = os.path.join(EXAMPLE_DOCS_DIRECTORY, \"example-10k.html\")\n",
    "detect_filetype(filename=filename)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c68e6a41",
   "metadata": {},
   "source": [
    "The output of `detect_filetype` is a `FileType` enum, which is defined in [`filetype.py`](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/file_utils/filetype.py). Check out the source file to see the full list of files that are supported by `detect_filetype`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b7dc938",
   "metadata": {},
   "source": [
    "## Summary Statistics <a id=\"summary\"></a>\n",
    "\n",
    "`unstructured` also provides utilities for summarizing the filetypes in a directory. You can use this utility for tasks such as counting by filetype and checking the average size of a file by filetype. The following example shows how to find a count of files by filetype in a directory and plot the results as a histogram."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "c53f054e",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "MIME type was message/rfc822. This file type is not currently supported in unstructured.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "FileType.EML     4\n",
       "FileType.TXT     3\n",
       "FileType.HTML    2\n",
       "FileType.XML     2\n",
       "FileType.PDF     2\n",
       "FileType.JPG     2\n",
       "FileType.UNK     1\n",
       "FileType.DOCX    1\n",
       "FileType.PPTX    1\n",
       "FileType.XLSX    1\n",
       "Name: filetype, dtype: int64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from unstructured.file_utils.exploration import get_directory_file_info\n",
    "\n",
    "file_info = get_directory_file_info(EXAMPLE_DOCS_DIRECTORY)\n",
    "file_info.filetype.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7e1b3300",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot: >"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAiMAAAHzCAYAAADy/B0DAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/P9b71AAAACXBIWXMAAA9hAAAPYQGoP6dpAABDnElEQVR4nO3dd3gVZf7//9cJ5QAhCUYgoQQE6b1YCKIUI4hZMKt4IbB0WAt8hEVRUJRVhLgqgi5IUQGBpahL8cuiGEFsQaWFImtBgaAkwQIJNSC5f3/wI0sg5ZwAuc9Mno/rmj/OFM77vhjmvLhn7ns8xhgjAAAAS4JsFwAAAIo3wggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCppuwBfZGVl6cCBAwoJCZHH47FdDgAA8IExRkeOHFHVqlUVFJR3/4cjwsiBAwcUFRVluwwAAFAI+/fvV/Xq1fPc7ogwEhISIulsY0JDQy1XAwAAfJGRkaGoqKjs3/G8OCKMnLs1ExoaShgBAMBhCnrEggdYAQCAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWHVJYeS5556Tx+PRyJEj893v7bffVoMGDVSmTBk1bdpUq1evvpSvBQAALlLoMLJx40bNmjVLzZo1y3e/xMRE9erVS4MHD9bWrVsVFxenuLg47dy5s7BfDQAAXKRQYeTo0aPq06ePXnvtNV111VX57vvyyy/r9ttv1+jRo9WwYUNNmDBBrVq10rRp0wpVMAAAcJdChZFhw4YpNjZWMTExBe67YcOGi/br0qWLNmzYkOcxmZmZysjIyLEAAAB3KunvAUuWLNGWLVu0ceNGn/ZPTU1VREREjnURERFKTU3N85j4+Hg9/fTT/paWwzVj/nNJx/ti73OxV/w7AABwO796Rvbv368RI0boX//6l8qUKXOlatLYsWOVnp6evezfv/+KfRcAALDLr56RzZs36+DBg2rVqlX2ujNnzuiTTz7RtGnTlJmZqRIlSuQ4JjIyUmlpaTnWpaWlKTIyMs/v8Xq98nq9/pQGAAAcyq+ekVtvvVU7duxQUlJS9nLdddepT58+SkpKuiiISFJ0dLTWrl2bY11CQoKio6MvrXIAAOAKfvWMhISEqEmTJjnWBQcH6+qrr85e369fP1WrVk3x8fGSpBEjRqh9+/aaPHmyYmNjtWTJEm3atEmzZ8++TE0AAABOdtlnYE1OTlZKSkr257Zt22rRokWaPXu2mjdvrnfeeUcrVqy4KNQAAIDiyWOMMbaLKEhGRobCwsKUnp6u0NBQn45hNA0AAHb5+vvNu2kAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVX6FkRkzZqhZs2YKDQ1VaGiooqOj9d577+W5/7x58+TxeHIsZcqUueSiAQCAe5T0Z+fq1avrueeeU926dWWM0Ztvvqk777xTW7duVePGjXM9JjQ0VN9++232Z4/Hc2kVAwAAV/ErjHTr1i3H54kTJ2rGjBn64osv8gwjHo9HkZGRha8QAAC4WqGfGTlz5oyWLFmiY8eOKTo6Os/9jh49qpo1ayoqKkp33nmnvv766wL/7MzMTGVkZORYAACAO/kdRnbs2KHy5cvL6/Xq/vvv1/Lly9WoUaNc961fv77mzJmjlStXauHChcrKylLbtm31008/5fsd8fHxCgsLy16ioqL8LRMAADiExxhj/Dng1KlTSk5OVnp6ut555x29/vrr+vjjj/MMJOc7ffq0GjZsqF69emnChAl57peZmanMzMzszxkZGYqKilJ6erpCQ0N9qvOaMf/xab9Lsfe52Cv+HQAAOFVGRobCwsIK/P3265kRSSpdurTq1KkjSWrdurU2btyol19+WbNmzSrw2FKlSqlly5bavXt3vvt5vV55vV5/SwMAAA50yfOMZGVl5ejFyM+ZM2e0Y8cOValS5VK/FgAAuIRfPSNjx45V165dVaNGDR05ckSLFi3S+vXrtWbNGklSv379VK1aNcXHx0uSnnnmGbVp00Z16tTR4cOH9cILL2jfvn0aMmTI5W8JAABwJL/CyMGDB9WvXz+lpKQoLCxMzZo105o1a3TbbbdJkpKTkxUU9L/OlkOHDmno0KFKTU3VVVddpdatWysxMdGn50sAAEDx4PcDrDb4+gDM+XiAFQAAu3z9/ebdNAAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqv8LIjBkz1KxZM4WGhio0NFTR0dF677338j3m7bffVoMGDVSmTBk1bdpUq1evvqSCAQCAu/gVRqpXr67nnntOmzdv1qZNm9SpUyfdeeed+vrrr3PdPzExUb169dLgwYO1detWxcXFKS4uTjt37rwsxQMAAOfzGGPMpfwB4eHheuGFFzR48OCLtvXs2VPHjh3TqlWrste1adNGLVq00MyZM33+joyMDIWFhSk9PV2hoaE+HXPNmP/4/OcX1t7nYq/4dwAA4FS+/n4X+pmRM2fOaMmSJTp27Jiio6Nz3WfDhg2KiYnJsa5Lly7asGFDvn92ZmamMjIyciwAAMCdSvp7wI4dOxQdHa2TJ0+qfPnyWr58uRo1apTrvqmpqYqIiMixLiIiQqmpqfl+R3x8vJ5++ml/S3MlengAAG7nd89I/fr1lZSUpC+//FIPPPCA+vfvr127dl3WosaOHav09PTsZf/+/Zf1zwcAAIHD756R0qVLq06dOpKk1q1ba+PGjXr55Zc1a9asi/aNjIxUWlpajnVpaWmKjIzM9zu8Xq+8Xq+/pQEAAAe65HlGsrKylJmZmeu26OhorV27Nse6hISEPJ8xAQAAxY9fPSNjx45V165dVaNGDR05ckSLFi3S+vXrtWbNGklSv379VK1aNcXHx0uSRowYofbt22vy5MmKjY3VkiVLtGnTJs2ePfvytwQAADiSX2Hk4MGD6tevn1JSUhQWFqZmzZppzZo1uu222yRJycnJCgr6X2dL27ZttWjRIo0bN06PP/646tatqxUrVqhJkyaXtxUAAMCx/Aojb7zxRr7b169ff9G6e+65R/fcc49fRQEAgOKDd9MAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAq/wKI/Hx8br++usVEhKiypUrKy4uTt9++22+x8ybN08ejyfHUqZMmUsqGgAAuIdfYeTjjz/WsGHD9MUXXyghIUGnT59W586ddezYsXyPCw0NVUpKSvayb9++SyoaAAC4R0l/dn7//fdzfJ43b54qV66szZs365ZbbsnzOI/Ho8jIyMJVCAAAXO2SnhlJT0+XJIWHh+e739GjR1WzZk1FRUXpzjvv1Ndff53v/pmZmcrIyMixAAAAdyp0GMnKytLIkSN10003qUmTJnnuV79+fc2ZM0crV67UwoULlZWVpbZt2+qnn37K85j4+HiFhYVlL1FRUYUtEwAABLhCh5Fhw4Zp586dWrJkSb77RUdHq1+/fmrRooXat2+vZcuWqVKlSpo1a1aex4wdO1bp6enZy/79+wtbJgAACHB+PTNyzvDhw7Vq1Sp98sknql69ul/HlipVSi1bttTu3bvz3Mfr9crr9RamNAAA4DB+9YwYYzR8+HAtX75c69atU61atfz+wjNnzmjHjh2qUqWK38cCAAD38atnZNiwYVq0aJFWrlypkJAQpaamSpLCwsJUtmxZSVK/fv1UrVo1xcfHS5KeeeYZtWnTRnXq1NHhw4f1wgsvaN++fRoyZMhlbgoAAHAiv8LIjBkzJEkdOnTIsX7u3LkaMGCAJCk5OVlBQf/rcDl06JCGDh2q1NRUXXXVVWrdurUSExPVqFGjS6scAAC4gl9hxBhT4D7r16/P8XnKlCmaMmWKX0UBAIDig3fTAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKv8CiPx8fG6/vrrFRISosqVKysuLk7ffvttgce9/fbbatCggcqUKaOmTZtq9erVhS4YAAC4i19h5OOPP9awYcP0xRdfKCEhQadPn1bnzp117NixPI9JTExUr169NHjwYG3dulVxcXGKi4vTzp07L7l4AADgfB5jjCnswb/88osqV66sjz/+WLfcckuu+/Ts2VPHjh3TqlWrste1adNGLVq00MyZM336noyMDIWFhSk9PV2hoaE+HXPNmP/4tN+l2Ptc7BX/Dre0AwBQ/Pj6+31Jz4ykp6dLksLDw/PcZ8OGDYqJicmxrkuXLtqwYUOex2RmZiojIyPHAgAA3KlkYQ/MysrSyJEjddNNN6lJkyZ57peamqqIiIgc6yIiIpSamprnMfHx8Xr66acLWxoCjFt6d2iHb9zQBokeQ6AoFbpnZNiwYdq5c6eWLFlyOeuRJI0dO1bp6enZy/79+y/7dwAAgMBQqJ6R4cOHa9WqVfrkk09UvXr1fPeNjIxUWlpajnVpaWmKjIzM8xiv1yuv11uY0gAAgMP41TNijNHw4cO1fPlyrVu3TrVq1SrwmOjoaK1duzbHuoSEBEVHR/tXKQAAcCW/ekaGDRumRYsWaeXKlQoJCcl+7iMsLExly5aVJPXr10/VqlVTfHy8JGnEiBFq3769Jk+erNjYWC1ZskSbNm3S7NmzL3NTAACAE/nVMzJjxgylp6erQ4cOqlKlSvaydOnS7H2Sk5OVkpKS/blt27ZatGiRZs+erebNm+udd97RihUr8n3oFQAAFB9+9Yz4MiXJ+vXrL1p3zz336J577vHnqwAAQDHBu2kAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVX6HkU8++UTdunVT1apV5fF4tGLFinz3X79+vTwez0VLampqYWsGAAAu4ncYOXbsmJo3b67p06f7ddy3336rlJSU7KVy5cr+fjUAAHChkv4e0LVrV3Xt2tXvL6pcubIqVKjg93EAAMDdiuyZkRYtWqhKlSq67bbb9Pnnn+e7b2ZmpjIyMnIsAADAna54GKlSpYpmzpypf//73/r3v/+tqKgodejQQVu2bMnzmPj4eIWFhWUvUVFRV7pMAABgid+3afxVv3591a9fP/tz27Zt9cMPP2jKlClasGBBrseMHTtWo0aNyv6ckZFBIAEAwKWueBjJzQ033KDPPvssz+1er1der7cIKwIAALZYmWckKSlJVapUsfHVAAAgwPjdM3L06FHt3r07+/OePXuUlJSk8PBw1ahRQ2PHjtXPP/+s+fPnS5KmTp2qWrVqqXHjxjp58qRef/11rVu3Th988MHlawUAAHAsv8PIpk2b1LFjx+zP557t6N+/v+bNm6eUlBQlJydnbz916pQefvhh/fzzzypXrpyaNWumDz/8MMefAQAAii+/w0iHDh1kjMlz+7x583J8fvTRR/Xoo4/6XRgAACgeeDcNAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsMrvMPLJJ5+oW7duqlq1qjwej1asWFHgMevXr1erVq3k9XpVp04dzZs3rxClAgAAN/I7jBw7dkzNmzfX9OnTfdp/z549io2NVceOHZWUlKSRI0dqyJAhWrNmjd/FAgAA9ynp7wFdu3ZV165dfd5/5syZqlWrliZPnixJatiwoT777DNNmTJFXbp08ffrAQCAy1zxZ0Y2bNigmJiYHOu6dOmiDRs25HlMZmamMjIyciwAAMCd/O4Z8VdqaqoiIiJyrIuIiFBGRoZOnDihsmXLXnRMfHy8nn766StdGgBYdc2Y/1zx79j7XOwV/fPd0AaJdvjqSrUhIEfTjB07Vunp6dnL/v37bZcEAACukCveMxIZGam0tLQc69LS0hQaGpprr4gkeb1eeb3eK10aAAAIAFe8ZyQ6Olpr167NsS4hIUHR0dFX+qsBAIAD+B1Gjh49qqSkJCUlJUk6O3Q3KSlJycnJks7eYunXr1/2/vfff79+/PFHPfroo/rmm2/06quv6q233tLf/va3y9MCAADgaH6HkU2bNqlly5Zq2bKlJGnUqFFq2bKlnnrqKUlSSkpKdjCRpFq1auk///mPEhIS1Lx5c02ePFmvv/46w3oBAICkQjwz0qFDBxlj8tye2+yqHTp00NatW/39KgAAUAwE5GgaAABQfBBGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYRRgAAgFWEEQAAYBVhBAAAWEUYAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAAIBVhBEAAGAVYQQAAFhFGAEAAFYVKoxMnz5d11xzjcqUKaMbb7xRX331VZ77zps3Tx6PJ8dSpkyZQhcMAADcxe8wsnTpUo0aNUrjx4/Xli1b1Lx5c3Xp0kUHDx7M85jQ0FClpKRkL/v27bukogEAgHv4HUZeeuklDR06VAMHDlSjRo00c+ZMlStXTnPmzMnzGI/Ho8jIyOwlIiLikooGAADu4VcYOXXqlDZv3qyYmJj//QFBQYqJidGGDRvyPO7o0aOqWbOmoqKidOedd+rrr7/O93syMzOVkZGRYwEAAO7kVxj59ddfdebMmYt6NiIiIpSamprrMfXr19ecOXO0cuVKLVy4UFlZWWrbtq1++umnPL8nPj5eYWFh2UtUVJQ/ZQIAAAe54qNpoqOj1a9fP7Vo0ULt27fXsmXLVKlSJc2aNSvPY8aOHav09PTsZf/+/Ve6TAAAYElJf3auWLGiSpQoobS0tBzr09LSFBkZ6dOfUapUKbVs2VK7d+/Ocx+v1yuv1+tPaQAAwKH86hkpXbq0WrdurbVr12avy8rK0tq1axUdHe3Tn3HmzBnt2LFDVapU8a9SAADgSn71jEjSqFGj1L9/f1133XW64YYbNHXqVB07dkwDBw6UJPXr10/VqlVTfHy8JOmZZ55RmzZtVKdOHR0+fFgvvPCC9u3bpyFDhlzelgAAAEfyO4z07NlTv/zyi5566imlpqaqRYsWev/997Mfak1OTlZQ0P86XA4dOqShQ4cqNTVVV111lVq3bq3ExEQ1atTo8rUCAAA4lt9hRJKGDx+u4cOH57pt/fr1OT5PmTJFU6ZMKczXAACAYoB30wAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMAqwggAALCKMAIAAKwijAAAAKsIIwAAwCrCCAAAsIowAgAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrChVGpk+frmuuuUZlypTRjTfeqK+++irf/d9++201aNBAZcqUUdOmTbV69epCFQsAANzH7zCydOlSjRo1SuPHj9eWLVvUvHlzdenSRQcPHsx1/8TERPXq1UuDBw/W1q1bFRcXp7i4OO3cufOSiwcAAM7ndxh56aWXNHToUA0cOFCNGjXSzJkzVa5cOc2ZMyfX/V9++WXdfvvtGj16tBo2bKgJEyaoVatWmjZt2iUXDwAAnK+kPzufOnVKmzdv1tixY7PXBQUFKSYmRhs2bMj1mA0bNmjUqFE51nXp0kUrVqzI83syMzOVmZmZ/Tk9PV2SlJGR4XOtWZnHfd63sPypp7Dc0A43tEGiHb5yQxsk2uErN7RBoh2+8rcN5/Y3xuS/o/HDzz//bCSZxMTEHOtHjx5tbrjhhlyPKVWqlFm0aFGOddOnTzeVK1fO83vGjx9vJLGwsLCwsLC4YNm/f3+++cKvnpGiMnbs2By9KVlZWfr999919dVXy+PxXJHvzMjIUFRUlPbv36/Q0NAr8h1XmhvaILmjHW5og0Q7Aokb2iC5ox1uaINUNO0wxujIkSOqWrVqvvv5FUYqVqyoEiVKKC0tLcf6tLQ0RUZG5npMZGSkX/tLktfrldfrzbGuQoUK/pRaaKGhoY4+uSR3tEFyRzvc0AaJdgQSN7RBckc73NAG6cq3IywsrMB9/HqAtXTp0mrdurXWrl2bvS4rK0tr165VdHR0rsdER0fn2F+SEhIS8twfAAAUL37fphk1apT69++v6667TjfccIOmTp2qY8eOaeDAgZKkfv36qVq1aoqPj5ckjRgxQu3bt9fkyZMVGxurJUuWaNOmTZo9e/blbQkAAHAkv8NIz5499csvv+ipp55SamqqWrRooffff18RERGSpOTkZAUF/a/DpW3btlq0aJHGjRunxx9/XHXr1tWKFSvUpEmTy9eKy8Dr9Wr8+PEX3R5yEje0QXJHO9zQBol2BBI3tEFyRzvc0AYpsNrhMaag8TYAAABXDu+mAQAAVhFGAACAVYQRAABgFWEEAABYRRgBAABWEUYAwIFSUlJslwBcNoSRXGzfvl2lS5e2XUa+ateurd9++812GVfcmTNndODAAdtlXLIff/xRnTt3tl0GHOLCN51fKCUlRR06dCiaYuAKu3btKnCfhQsXFkEluSOM5MIYozNnztguI1979+4N+Bovh507dyoqKsp2GZfsyJEjF70WIdDUqFEjR8CdNm1akbzy/HI7ceKEVq1alf353Is3zy2jR4/WyZMnLVZYsLlz52rixIm5bjsXRCpVqlTEVfnvySef1B9//JHn9uTkZN12221FWFHhvPHGG/luP3LkiIYMGVJE1RRO69at9eKLLyq3qcXS0tLUvXt3PfDAAxYqO4swAkCS9NNPP+UIuI8//rh+/fVXixUVzptvvqlZs2Zlf542bZoSExO1detWbd26VQsXLtSMGTMsVliwd999V5MmTbqoztTUVHXs2FHh4eF6//33LVXnuzfffFPXX3+9du7cedG2WbNmqUmTJipZMiBfHp/DqFGj9Kc//UmpqakXbVuzZo0aN26sjRs3WqjMdwsXLtTzzz+vW265RT/88EOO9Y0aNdLhw4e1detWewUaXCQpKckEBQXZLiNfHo/HzJ8/36xcuTLfxemc8HfhCye0w+PxmLS0tOzP5cuXNz/88IPFigqnXbt25t13383+fGE7FixYYNq0aWOjNL+sWrXKeL1es3jxYmOMMSkpKaZBgwbmhhtuMBkZGZar8016errp27ev8Xq9ZtKkSebMmTNm37595tZbbzWhoaFm1qxZtkv0yZ49e0yHDh1MeHi4WbRokTHGmIyMDDNo0CBTqlQpM3bsWHPq1CnLVRYsLS3NxMXFmeDgYPPCCy+Y7t27m7Jly5rJkyebrKwsq7UVyzCSnp6e7/Lpp5864oejoCXQ2+ALJ/yI+8IJ7XBLGImMjDR79uzJ/lyxYsUcn7/99lsTGhpa9IUVwr/+9S9TpkwZM3fuXNOwYUNz3XXXmcOHD9suy28rVqwwERERpnnz5iY0NNTExMSYvXv32i7Lb1OmTDHBwcEmNjbW1KhRwzRq1Mh89dVXtsvyW+/evY3H4zHly5c327dvt12OMcaYwO8fuwIqVKggj8eT53ZjTL7bA0VqaqoqV65su4xLsn379ny3f/vtt0VUyaVp2bJlvufM8ePHi7Cawnv99ddVvnx5SdIff/yhefPmqWLFijn2eeihh2yU5rPDhw8rMzMz+/Mvv/ySY3tWVlaO7YGsd+/eOnz4sAYPHqxWrVrpww8/VFhYmO2y/NamTRs1bdpUa9euVXBwsMaNG6eaNWvaLstv9913nz755BOtWLFCwcHBWrVqlZo2bWq7LJ8dOnRIw4YN08qVKzVmzBgtXbpUvXr10vz589WqVSurtRXLMPLRRx/ZLuGSOSEs+aJFixbyeDy5PlR1br0T2hoXF2e7hEtWo0YNvfbaa9mfIyMjtWDBghz7eDyegA8j1atX186dO1W/fv1ct2/fvl3Vq1cv4qr8c2G4LVWqlA4fPqyOHTvm2G/Lli1FXZrfFi9erOHDh6tFixb673//qzfeeEOdO3fWgw8+qPj4eJUpU8Z2iT75/PPPNXDgQJUsWVLvv/++Xn/9dUVHR2vixIkaMWKE7fIKtGrVKg0dOlQ1atTQ5s2b1aBBAz3xxBN65JFHFB0drUcffVTjx4+39gwPb+11qKCgIFf0jOzbt8+n/Zz4vyjYMWLECH344YfavHnzRT90J06c0HXXXaeYmBi9/PLLlios2NNPP+3TfuPHj7/ClVyau+++W2vWrFF8fLz+7//+L3t9YmKiBg4cKEmaN2+eoqOjbZXok4cffljTpk3T8OHDNXHixOzzaunSpRo+fLgaN26suXPnqlatWpYrzZvX69X48eM1ZswYBQXlHLuSkJCgIUOG6KqrrlJSUpKV+ggjDjVw4EC98sorCgkJyXOf06dPq1SpUkVYFWBfWlqaWrRoodKlS2v48OGqV6+epLO3/KZNm6Y//vhDW7duVUREhOVK3e+mm27SvHnzVLdu3Yu2nThxQmPGjNGMGTN06tQpC9X5rk6dOpo7d65uvvnmi7alpaXpr3/9q9atW6cjR45YqM4327dvV7NmzfLcnpGRob/97W8FDmO+UoplGClRooRP+wXyPB59+/bV9OnTFRoamuv2TZs2acCAAbkOqQskycnJPu1Xo0aNK1zJpenUqZNP+61bt+4KV3JpsrKyNG/ePC1btkx79+6Vx+NRrVq11KNHD/Xt29cRt8wkac+ePXrggQeUkJCQfQvQ4/Hotttu06uvvqratWtbrrB4yMrKuuh/4Rf65JNPdMsttxRRRYVz/PhxlStXLt99FixYoL59+xZRRe5TLMNIUFCQatasqf79+6tly5Z57nfnnXcWYVX+ad26tdLS0vTGG2+oS5cu2etPnz6tp556SpMnT9agQYM0c+ZMi1UW7PxgeP6PxvnrPB5PQAdD6X/nVGxsbL69UVOmTCnCqvxjjFG3bt20evVqNW/eXA0aNJAxRv/973+1Y8cOde/eXStWrLBdpl9+//137d69W9LZ/92Gh4dbrsg3HTt2LDD4eTyegJ9I78yZM/r6669Vt25dlS1bNse248ePa/fu3WrSpEmBgcW22rVra+PGjbr66qttl1Jo3333nQ4fPqwbbrghe93atWv17LPP6tixY4qLi9Pjjz9urb5i+QDrV199pTfeeEMvv/yyatWqpUGDBqlPnz666qqrbJfmsy+//FLPPPOMunXrpoEDB2ry5Mn65ptv1L9/fx09elSrVq1yxPTjHo9H1atX14ABA9StWzdHTICUm3/84x+aO3eu3n77bfXp00eDBg1SkyZNbJfll3nz5umTTz7R2rVrL3pQct26dYqLi9P8+fPVr18/SxX6bu/evUpISNDp06d1yy23OO7vokWLFnluO3LkiBYtWuSIEUELFizQtGnT9OWXX160rXTp0ho0aJBGjhypv/zlLxaq850bZrx+7LHH1LRp0+wwsmfPHnXr1k0333yzmjVrpvj4eJUrV04jR460U2DRjyYOHCdOnDALFiwwnTp1MuXKlTM9e/Y0H3zwge2y/LJx40bTuHFjU6VKFVOqVCkzaNAgk56ebrssn6WkpJjnnnvO1K9f30RERJiHH37Y7Nq1y3ZZhZaYmGiGDBliQkNDzfXXX29mzJjhmL+P2267zcTHx+e5feLEiaZz585FWFHhrFu3zpQrVy57vp1SpUqZBQsW2C7rkp0+fdpMnTrVVKpUydSpUyd7MrRA1q5du3zrXLp0qbn55puLsKLCuXAOHieqXr26SUxMzP48YcIE07x58+zPr7/+eo7PRa1Yh5Hz/fjjj6Zjx44mKCjI/Pbbb7bL8dmOHTtMixYtTLly5UxwcLCjL7qffvqpGTRokAkJCTE33nijmT17tjlz5oztsgrl2LFjZt68eeb66683wcHBjggkERERZuvWrXlu37Jli4mIiCi6ggrppptuMnfeeac5cOCA+f33382DDz5oqlSpYrusS7Jw4UJTu3ZtU6VKFTN9+nRz+vRp2yX5pFKlSjkmnLvQjz/+aCpWrFh0BRWSG2a8LlOmjElOTs7+3KlTJzNu3Ljsz7t37zZhYWEWKjur2IeR/fv3mwkTJphrr73WVKlSxTz22GOO+IeelZVlJk2aZLxerxkwYIA5dOiQmT59uilfvrz585//bA4ePGi7xEJLTU11ZDA836effmoGDhxoypcvb2688UZz/Phx2yUVqFSpUubAgQN5bv/5559N6dKli7CiwgkLCzNff/119udjx46ZEiVKmF9//dViVYXz3nvvZc9a+swzz5ijR4/aLskv5cqVM9u2bctz+7Zt20y5cuWKsKLCccOM11WrVjVffvmlMcaYM2fOmNDQULNq1ars7bt27bI6M3FgPzV0hZw6dUpLly5V586dVbduXW3ZskVTp07V/v379dxzzzniuYU2bdron//8p95++23NnTtXFSpU0IMPPqht27bp119/VaNGjbR06VLbZfolMTFRQ4YMUb169XT06FFNnz5dFSpUsF2Wzw4cOKBJkyapXr166tGjh8LDw/Xll1/qiy++uOjhvUB05syZfM/9EiVK5PsG1kCRkZGRY9bYcuXKqWzZskpPT7dYlX+++uordezYUX/+85/VsWNH/fDDD3ryyScVHBxsuzS/1K1bV4mJiXlu/+yzz3Id9huIUlNTlZWVlecS6M+UdOjQQRMmTND+/fs1depUZWVlqUOHDtnbd+3apWuuucZafYH/q3sFVKlSRSEhIerfv79effXV7InDjh07lmO/vIbNBoJatWrpvffeu2h0QO3atfXxxx9r6tSpGjx4sHr27GmpQt+kpKRo/vz5mjt3rg4dOqQ+ffro888/d9wDh3fccYc++ugjde7cWS+88IJiY2MdEWrPZ4zRgAED5PV6c93uhAcmz1mzZk2OadOzsrK0du3aHEPdu3fvbqM0n7Rp00Zly5bV/fffr1q1amnRokW57hfos+H27t1b48aNU9u2bS+a42Lbtm166qmn9Oijj1qqzndOGdKen4kTJ+q2225TzZo1VaJECb3yyis5wu2CBQt8nqLgSii2Q3vPye0kMw4YTpqcnKyoqKh8/5F8//33Af+/jlKlSqlatWrq37+/unfvnuew2Pwm6wkEQUFBqlKliipXrpzv30kgT999bkbMgsydO/cKV3JpfBkmGuj/vq+55hqfhvb++OOPRVRR4Zw+fVqdO3fWZ599ppiYGDVo0ECS9M033+jDDz/UTTfdpISEhICfnNEtM17/8ccf+vrrr1WpUiVVrVo1x7Zt27YpKirK2vD3YhlGPv74Y5/2a9++/RWupPBKlCihlJQUx//jyC0YXnhKBvoPh+Se6buBy+306dOaMmWKFi1apO+//17GGNWrV0+9e/fWyJEjVbp0adslFsiXGa+d7scff9T999+vDz74wMr3F8sw4gZuSeq8myawnJuf49SpU+rQoYMaN25suyTAujNnzujFF1/Uu+++q1OnTunWW2/V+PHjHfEsmK+2bdumVq1aWfuPn7Nual8mb731luLi4rIT+U8//aSqVatm/y/9+PHjmjZtWsDfy3TDfcw333xTjzzySIFTLQe6Xbt2qVGjRvnus3DhwoCe3Omjjz7Sn/70J504cUKSVLJkSc2ZMyega87Nu+++W+A+JUuWVGRkpJo0aRKQ/zMfNWpUruvDwsJUr1493XXXXXk+2xOITpw4oYSEBH333XeSpPr16ysmJsYxP+aTJk3S3//+9+yaX375ZR08eFBz5syxXZprFMuekQtvcYSGhiopKSn7fRVpaWmqWrVqQN8aCAoK0l//+tcCf8RfeumlIqqocNxyu6ls2bKaMGGCHn744YtCYlpamoYOHaqPPvoooF+k1a5dO1WsWFEzZsxQmTJlNG7cOC1fvlwHDhywXZpf/JlaPDIyUkuXLs31BWg2XTgD7jmHDx/W7t27FRERoXXr1gX8O5uks+FwyJAh+vXXX3Osr1ixot544w1169bNUmW+q1u3rh555BHdd999kqQPP/xQsbGxOnHiRMBPZe8r2z0jxTKMXHiLIyQkRNu2bXNcGImOjs73f3UejyfgX8zmlttN//73v/XAAw+ofv36mjdvnq699lpJZ3tDRowYocaNG2vOnDmqU6eO5UrzVqFCBSUmJmb38Bw/flyhoaFKS0tz9Ds5cmOMUVpamp599lklJiYG9IPFF8rIyFCfPn0UEhKS5yibQJGYmKgOHTqoe/fuevjhh9WwYUNJZ3sSJ0+erFWrVunjjz9WmzZtLFeaP6/Xq927dysqKip7XZkyZbR7925Vr17dYmWXD2HEAreEETf8iAcFBSktLU2VKlWyXcolO3jwoO677z4lJCTo73//uz799FMlJCTo2Wef1d/+9reAv62W2zl14b8Nt9m7d68aNGigkydP2i7FL1999ZXuuecen5+5suWOO+5QVFSUZs2alev2++67T/v379fq1auLuDL/lChRQqmpqTmuUyEhIdq+fbtq1aplsTLftWzZMt9r0PHjx/X999/zzAj8E+g/bP6oV69ege35/fffi6iawqtcubKWL1+uPn366NFHH1VwcLC+/PJLNW3a1HZpPnP6/By+SElJ0enTp1WjRg1dc801SktLs12S3ypWrOiIfxNffPGF/vGPf+S5fdiwYQE9avGc3ObgOXnypO6///4cc3UsW7bMRnk+iYuLs11CvoptGDn/onvhBffw4cMWK/ONmzq0nn766Rw/gE516NAhDRs2TCtXrtSYMWO0dOlS9erVS/Pnz1erVq1sl+eT/v37X7Tu3H1yyRnDrAvSqVMnfffdd9ntcOK598UXX2TfCgxkJ06cyHfyyLCwMEf0SuX278JpD3YH+rQCxTaMXHhynX/BlQK/52Hu3LmOvIjm5t5773X87aZVq1Zp6NChqlGjhjZv3qwGDRroiSee0COPPKLo6Gg9+uijGj9+fEDPypqVlWW7hCIxf/58HT9+3HYZ+dq+fXuu69PT07V582ZNmjQp4H9cpLMPfq5bty7PCfXWrl0b8BMzSoE/0Z8vAn7EXxG/CwdF5MCBA2bfvn22yyhQUFCQ41/NbYwxpUuXNhMnTsz1LcMffPCBqVGjhtXXc8NZzr14LbcXslWqVMnEx8ebrKws22UW6KWXXjLh4eHmP//5z0XbVq1aZa6++mozefJkC5X5b8+ePWb27Nlm2rRpZufOnbbL8VuZMmXMCy+8kOt5k5qaarp162bKly9vobKzCCMu1aBBg4B/i6QxZy+6bggj+b2Z1Bhj0tPTzaBBg4qomivDKQH3fIcOHTKvvfaaGTNmTPYboDdv3mx++ukny5Xlb+/evbkuv//+u+3S/HLmzBnTo0cP4/F4TIMGDcyf//xnExcXZ+rXr2+CgoLMXXfdlWuADzTr1q0z5cqVyw6EpUqVMgsWLLBdll/eeecdU6lSJdOuXTuze/fu7PULFiww4eHh5uabbzbff/+9tfqK5Wiagpz/gJtTbdy4UcePH3fEw2FwhoYNG+Z41iLQbd++XTExMQoLC9PevXv17bffqnbt2ho3bpySk5M1f/582yUWG0uXLtXixYuzJz2rV6+e7r33Xt17772WK/ONW+bgCeQRf4SRXDjtoutkd911l0/7BfJT6r4g4Ba9mJgYtWrVSs8//3yOIcqJiYnq3bu39u7da7vEQnPD+eQkbpuDp0+fPlq8eLGCg4OVmJgYECP+AvdpOouc8IDb+Q4fPqx33nlHP/zwg0aPHq3w8HBt2bJFERERqlatmu3y8nXhQ7iLFi1St27dXPdCqgtHcDjR9ddfb7sEv2zcuDHX+S2qVaum1NRUCxVdPk47n9LT05WQkKC9e/fK4/Godu3auvXWW/MdaRNIMjIyVLFixezP5cqVU9myZZWenu6oMBLII/4II7lw0kX3wq7ooUOHKjw8XMuWLXNEV/SFT6m/8847ev755103yRYBt+h5vV5lZGRctP67775z/CR7TjqfFi5cqOHDh1/0dxEWFqaZM2eqZ8+elirzj9Pn4An4EX/WnlYJEE59wO2cW2+91YwePdoYY0z58uXNDz/8YIwx5vPPPzc1a9a0WFnhnN8G2LFt2zZTqVIlU6dOHVOyZMnsv48nnnjC9O3b13J1vhs8eLCJi4szp06dMuXLlzc//vij2bdvn2nZsqUZMWKE7fKKhc2bN5uSJUua/v37m6SkJHPy5Elz4sQJs3nzZtO3b19TqlQpk5SUZLvMAuU2qunCJdAHDAT6iL9iHUbccNENDQ3NfjL6/B/yvXv3Gq/Xa7O0QnFDGCHgBobDhw+bmJgYU6FCBVOiRAkTFRVlSpUqZW655RZz9OhR2+X5zMnn04ABA0yPHj3y3H733XebgQMHFmFFxVegj/gr1rdpRo0apQEDBmQ/4HbOHXfcod69e1uszHdu7op2IqffNpPc86xFWFiYEhIS9Nlnn2n79u06evSoWrVqpZiYGNul+czp59Pnn3+uV199Nc/t999/vx588MEirKj4atasWb7bQ0ND9cYbbxRRNRcr1mHEDRfd7t2765lnntFbb70l6ezMscnJyXrsscd09913W66uYO+++26Oz7ndh5UC+17s+Qi4gaddu3Zq166d7TIKxenn04EDB1SvXr08t9erV08///xzEVZ0ZbhhdJP1NljrkwkAlSpVMlu2bDHG5OyO/uCDD0z16tVtluYzp3dFu+Fe7PnccNvMTc9afPjhhyY2NtbUrl3b1K5d28TGxpqEhATbZfnM6edTQZMapqamOurfd16cMslkfmy3oVj3jDi9V0Fyfle0296H4oZehcmTJ6tHjx6qXLmyTpw4ofbt2ys1NVXR0dGaOHGi7fJ89uqrr2rEiBHq0aOHRowYIensC+buuOMOTZkyRcOGDbNcYcHccD5dOArlfE54KakvnDS6KS+221CsJz1LT09Xjx49tGnTJh05ckRVq1bNvuiuXr06x6uhAV8MGTJEv/32m9566y2Fh4dr+/btKlGihOLi4nTLLbdo6tSptkv0mVMD7jnVq1fXmDFjNHz48Bzrp0+frkmTJjni9oDTz6egoKAC93HDm6Bx6Yp1GDnH6RfdtWvXasqUKfrvf/8r6ewMsiNHjnRcO3Jj/T6mnwi4gaN8+fJKSkpSnTp1cqz//vvv1bJlSx09etRSZb7jfAo8bpiDJxDbQBhxuPO7oqOjoyWd7Yp+5513HNMVnR+nTs1PwLWvd+/eatmypUaPHp1j/YsvvqhNmzZpyZIllirzn9PPJ7dww/uOArUNxT6MOP2i64au6Pw47X0obuCWgPvss8/qxRdf1E033ZSjHZ9//rkefvjhHFORP/TQQ7bKLNac1vPphvcdBWobinUYccNF1w1d0W5DwA0MtWrV8mk/j8ejH3/88QpXU3hOP5/y47Sez7CwMG3ZskXXXnttjh/yffv2qX79+jp58qTtEgsUqG0o+OkiF5s0aZKmTJmixYsX66GHHtJDDz2kRYsWacqUKZo0aZLt8nzSvXt3LV++/KL1K1eu1J/+9CcLFRXe4cOH9frrr2vs2LH6/fffJUlbtmxxzI+fdDbg3n777QoJCdGIESM0YsQIhYaG6o477tD06dNtl+eTw4cP6/bbb79ofefOnZWenm6hosLZs2ePT0sgBxE3nE/5mT9/vtatW2e7DJ+5YXRTwLbB1pjiQBAcHGy+//77i9Z/9913Jjg42EJF/pswYYIJCwszd9xxh5kwYYKZMGGCiY2NNRUqVDATJkwwL7/8cvYSyNwwNb8xxlSrVs3885//vGj9tGnTTNWqVS1U5L9evXqZ559//qL1L7zwgunZs6eFigpn3bp1tku4ZG44n9zEDXPwBGobivVtGjc84OaWruhAvY/pLzfcNnPLsxZer1fVq1fXwIED1b9/f0VFRdkuyW9uOJ/OCcQRHP5yw+imQG1DsQ4jbrnoukGg3sf0FwE3cPz6669asGCB3nzzTX399dfq1KmTBg8erLi4OJUuXdp2eT5xw/kkBe4IjsJyw+imQGtDsQ4jbrjofvTRR+rYsaPtMi5Z5cqVtWbNGrVs2TJHGElISNCgQYO0f/9+2yX6hIAbmLZs2aK5c+dq8eLFks7+yA8ePFjNmze3XFn+3HI+uaXnE1dOsQ4jbuCGrmjJ+TNNnkPADVwHDhzQ7Nmz9dxzz6lkyZI6efKkoqOjNXPmTDVu3Nh2eblyw/kkuafnU3LH6KaAbIO1p1UCgBsecPvll1/MSy+9ZJo3b25KlixpOnfubJYuXWoyMzNtl+YXp7/wz01Kly5tateubSZMmGCSk5Ntl3NJTp06Zd5++23TtWtXU7JkSdOmTRvz2muvmaNHj5o9e/aYPn36mIYNG9ou0/Xc8FJSY4yZPn26KVmypLn33nuzBwb06tXLlCpVykybNs12eT4J1DYU6zDipouuMcZs3rzZDB8+3Fx99dXm6quvNv/3f/9nkpKSbJfll08//dRMnz7d/OMf/3DU21XPIeDa17FjR3Po0KHsfwvh4eFmxIgRZseOHRftm5KSYjwej4UqfeOG88mYwB3B4S83jG4K1DYU6zDi9Itubn7++Wczfvx44/V6TXBwsClRooRp166d2blzp+3SigUCrn1BQUEmLS3NdOrUySxatMicPHkyz31Pnz5t1q9fX4TV+cct55Nbej7dMB1EoLahWIeR8znxonuOW7qiP/zwQxMbG2tq165tateubWJjYx3XO0LAtc/j8Zi0tDTbZVwWbjufnN7z6YY5eAK1DTzAeh4nPeDWqVMnLVu2TE8++aQWL14sY4z69u2rIUOGqEmTJjn2TU1NVdWqVZWVlWWp2oK5YWr+Czl1BIcknT59WitXrtScOXOUkJCg6667ToMHD1avXr30yy+/aNy4cdqyZYt27dplu9SLBAUFad26dQoPD893v2bNmhVRRZeHk88nt3DD6KaAbYO1GBQgnNqr4KauaGMC9z7mpXJSr4JbnrXweDwmKCjIeDyei5Zz64OCgmyXWShOOp8u5Iaez2uuucanpVatWrZLzVOgtqFYhhE3XHTd1BVtTODexywMAq5dHo/HbNy40ezduzffxSmcej6dL1BHcCBwFMsw4oaLrsfjMR999JHZtm1bvotTBOp9TF8RcAOHG9rhhvPpfG7p+XTD6KZAbUOxfGYkKChIqampqly5su1SCi0oKEgej0e5/fWdW+/xeBzzau6AvY/poxIlSiglJUW9evXSkCFDdNddd8nr9ea67x9//KHPP/9c7du3L+Iq8+eWZy3c8O/bDefT+dzyjh03TDIZqG0otmHE6RfdoKAgffXVVwW+8rlmzZpFVNGlcfpMk274AXRLwO3YsaOWL1+uChUq2C6l0NxwPp3PLe/YccP7jgK1DcU2jDj9ouu2i5XTEXADk1PfFOuG8+l8Tu/5zI0bRjcFUhuKbRhx+kXXbWHE6e9DIeAGHie/KdYN59P5nN7zmRcnTQeRl0BpQ8ki+6YAU6NGDUdfdNu3b++YbkFf3H777QF5H9MfX375ZYEBF0Vn1KhRGjBgQPabYs+544471Lt3b4uV+cZN59OePXtsl3DZ5DYHz7Rp03LMwXPPPfcE5Bw85wRkG4r6idlA4Ian7c936NAh89prr5kxY8aY3377zRhzdkbZn376yXJlvnP6TJNuOKc6dOhgDh06ZLuMyyY0NNTs3r3bGJPz5Wx79+41Xq/XZmkFcsP5dL5AHcHhKzeMbgr0NhTLMOKmi+62bdtMpUqVTJ06dUzJkiWzL7hPPPGE6du3r+XqCseJU/O77cfDDQHXyW+Kddv55PR37LhhOohAb0OxDCPnc/pF99ZbbzWjR482xuS84H7++eemZs2aFiu7NE6baZKAG3ic/KZYN51PxtDzGQgCvQ3F8gHWc5z8gNs5YWFh2rJli6699lqFhIRo27Ztql27tvbt26f69evr5MmTtkv0mZPfh3I+p47gOCcmJkatWrXKftbi3DmVmJio3r17a+/evbZL9El6erp69OihTZs26ciRI6patapSU1MVHR2t1atXKzg42HaJPnH6+XShQBrB4Ss3jG4K+DbYTkM2uaFXwcld0cYE/n1Mf7mhV8HJz1rkxslvinXD+ZQbp/V8uuF9R4HehmI7mkaSNm7cqFmzZl20vlq1akpNTbVQkf+6d++uZ555Rm+99Zaks0PjkpOT9dhjj+nuu++2XF3BPv74Y506dUq7du3SP//5z3xnmqxYsaI++uijIq7QP04fwSGdnaExIyPjovXfffedI0d3tGvXTu3atbNdRqG44Xw6JyBHcPjBDaObArkNxTqMuOGiO3nyZPXo0UOVK1fWiRMn1L59++yu6IkTJ9our0Dm/79LuHbt2gL3LVmyZEBPeS0RcG175ZVXfN7XCZNrOf186tSpk5YtW6Ynn3xSixcvljFGffv21fPPP68mTZpk7xccHKwXX3xRVatWtVht/pw+HYQU2G0o1mHEyRfdc8LCwpSQkKDPPvtM27dv19GjR9WqVSvFxMTYLs1nu3btKvDCGsj3Ys9HwLVrypQpPu3n8XgcEUacfj65recTV06xfoDVLQ+4OZnbZpocMmSIfvvtN7311lsKDw/X9u3bVaJECcXFxemWW27R1KlTbZfoMycHXLdw+vnklll93fC+o0BvQ7EOI+c47aLrpq5oN0zNfz4CLi4np59PAT+CoxDcMLopENtAGHEgN73nwS3/c7oQAdeOUaNGacKECQoODtaoUaPy3fell14qoqoundPOp3Pc1vPphukgArUNxS6MuOWi6xZuDSNO45aA27FjR7344otq2bKlbr311jz383g8WrduXRFWVjy5refTDXPwBGobil0YcctF1y0C/T6mLwi4gaVEiRJKSUnJDrg9e/bUK6+8ooiICMuV+cZN55Pb/rPhhkkmA7UNxW40jRveHummrujzn54PxPuYvnDbCA6nu/D/V++9956OHTtmqRr/cT4FLqePbpICtw3FLoy4wdatW/XNN9+oZcuW2rp1a577eTyeIqzq0lx4H3Po0KEKDw/XsmXLAv5eLAE3sDmt89cN59M57du3V+nSpW2Xcdm4YTqIQG1DsbtN45aLrtO7oi8UqPcxiws3PWtRokQJpaamZv8vLyQkRNu3b/f5Fi2uDKf2fJ7P6aObpMBtQ7HrGXFLr4LTu6Iv5OSZJt0QcD/66KPsgHvu1plTA64xRgMGDMieXOvkyZO6//77L7rILlu2zEZ5BXLD+XQhJ/d8ns8Nk0wGahuKXRhx00X3fE7v4ArU+5i+IOAGlv79++f4/Je//MVSJYXjlvPpfG56x47k7PcdnRNobSh2YURyx0XX4/FcdDFy0sXpQoF6H9MXBNzAMnfuXNslXBI3nk9O7vl0w+gmJ7ShWIaRCznxouv0rugLOfl9KBIBF5eXG86n8zm559MNo5uc0IZiGUbccNF1elf0hQL1PmZhEXBxOTnxfDqfk3s+3TC6yQltKHajaaSzE/F07do1+6L7//7f/1OnTp246KLQ3DCCY+DAgT7t5/TbIE7ghvPpfIE6ggOBo1iGES66gcEJ9zF9RcDF5eTW88mJPZ9uGN3khDYUyzCCwOCmqfkJuLicOJ8Chxvm4HFCGwgjAIDLzk09n26YZDLQ20AYAQBcdm7q+bzwhX+hoaFKSkpS7dq1LVfmu0BvQ7EcTYPA4IT7mAAKxwkjOArLDf+HD7Q2EEZgjRtnmgTgPm6YDiLQ28BtGlgV6PcxARSOm3o+3TC6KdDbQM8IrHLbTJMAznJTz6cbJpkM9DbQMwKrLnyoKiQkRNu2bQuYh6oAFB49n/BVkO0CULwF+n1MAIVHzyd8xW0aWMX7UIDig4545IUwAqsC/T4mgMKj5xO+4pkRAMAVEegjOBA46BkBAFwR9HzCV/SMAAAAqxhNAwAArCKMAAAAqwgjAADAKsIIAACwijACAACsIowAAACrCCMAAMCq/w88uqkaYYVZSwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "file_info.filetype.value_counts().plot.bar()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c756d12e",
   "metadata": {},
   "source": [
    "You can also use this utility to find the average file size of documents in a directory, grouped by filetype."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a600fb0f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>filesize</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>filetype</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>FileType.DOCX</th>\n",
       "      <td>36602.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FileType.EML</th>\n",
       "      <td>149088.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FileType.HTML</th>\n",
       "      <td>1228404.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FileType.JPG</th>\n",
       "      <td>64002.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FileType.PDF</th>\n",
       "      <td>2429245.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FileType.PPTX</th>\n",
       "      <td>38412.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FileType.TXT</th>\n",
       "      <td>619.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FileType.UNK</th>\n",
       "      <td>1102.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FileType.XLSX</th>\n",
       "      <td>4765.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FileType.XML</th>\n",
       "      <td>713.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                filesize\n",
       "filetype                \n",
       "FileType.DOCX    36602.0\n",
       "FileType.EML    149088.5\n",
       "FileType.HTML  1228404.0\n",
       "FileType.JPG     64002.5\n",
       "FileType.PDF   2429245.0\n",
       "FileType.PPTX    38412.0\n",
       "FileType.TXT       619.0\n",
       "FileType.UNK      1102.0\n",
       "FileType.XLSX     4765.0\n",
       "FileType.XML       713.5"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "file_info.groupby(\"filetype\").mean(numeric_only=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "870ede91",
   "metadata": {},
   "source": [
    "If you want to pass in a list of filenames instead of a directory, use `get_file_info` instead of `get_directory_file_info`, as seen in the workflow below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "e5e3a24d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from unstructured.file_utils.exploration import get_file_info\n",
    "\n",
    "filenames = [os.path.join(EXAMPLE_DOCS_DIRECTORY, f) for f in os.listdir(EXAMPLE_DOCS_DIRECTORY)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "d8e59472",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/fake-html.html',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/example-10k.html',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/factbook.xml',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/fake-email-header.eml',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/fake.docx',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/fake-email-image-embedded.eml',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/fake-text.txt',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/layout-parser-paper-fast.pdf',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/email-with-image.eml',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/layout-parser-paper-fast.jpg',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/fake-power-point.pptx',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/fake-email.txt',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/README.md',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/factbook.xsl',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/fake-excel.xlsx',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/fake-email.eml',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/layout-parser-paper.pdf',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/fake-email-attachment.eml',\n",
       " '/Users/mrobinson/repos/unstructured/examples/training/../../example-docs/example.jpg']"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "filenames"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "cb0add28",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "MIME type was message/rfc822. This file type is not currently supported in unstructured.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "FileType.EML     4\n",
       "FileType.TXT     3\n",
       "FileType.HTML    2\n",
       "FileType.XML     2\n",
       "FileType.PDF     2\n",
       "FileType.JPG     2\n",
       "FileType.UNK     1\n",
       "FileType.DOCX    1\n",
       "FileType.PPTX    1\n",
       "FileType.XLSX    1\n",
       "Name: filetype, dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "file_info = get_file_info(filenames=filenames)\n",
    "file_info.filetype.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ac4473b0",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
